
86 research outputs found

Detecting Sarcasm in Multimodal Social Platforms

Author: Bamman D.
Davidov D.
Frome A.
Ghosh D.
Gibbs R.
González-Ibáñez R.
Kincaid J. P.
Mikolov T.
Riloff E.
Tepperman J.
Tsur O.
Veale T.
Verstraten P.
Wang Z.
You Q.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2016

Sarcasm is a peculiar form of sentiment expression, where the surface sentiment differs from the implied sentiment. Sarcasm detection on social media platforms has so far been applied mainly to textual utterances, where lexical indicators (such as interjections and intensifiers), linguistic markers, and contextual information (such as user profiles or past conversations) were used to detect the sarcastic tone. However, modern social media platforms allow users to create multimodal messages in which audiovisual content is integrated with the text, so analyzing any single modality in isolation gives only a partial picture. In our work, we first study the relationship between the textual and visual aspects of multimodal posts from three major social media platforms, i.e., Instagram, Tumblr and Twitter, and we run a crowdsourcing task to quantify the extent to which human annotators perceive images as necessary. Moreover, we propose two different computational frameworks to detect sarcasm that integrate the textual and visual modalities. The first approach exploits visual semantics trained on an external dataset and concatenates the semantic features with state-of-the-art textual features. The second method adapts a visual neural network, initialized with parameters trained on ImageNet, to multimodal sarcastic posts. Results show the positive effect of combining modalities for the detection of sarcasm across platforms and methods.
Comment: 10 pages, 3 figures, final version published in the Proceedings of ACM Multimedia 201
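The fusion step of the first framework, concatenating visual semantic features with textual features, can be sketched as follows. This is a minimal illustration, not the authors' code. The feature values are hypothetical. The per-modality L2 normalization is an added assumption: the abstract only says the features are concatenated, and normalizing first keeps one modality from dominating purely by scale.

```python
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(text_feats: List[float], visual_feats: List[float]) -> List[float]:
    """Early fusion (assumed variant): normalize each modality, then concatenate."""
    return l2_normalize(text_feats) + l2_normalize(visual_feats)


# Hypothetical feature vectors for a single post.
text_vec = [3.0, 4.0]          # e.g. lexical / sentiment features
visual_vec = [0.0, 2.0, 0.0]   # e.g. scores for visual concepts
print(fuse_features(text_vec, visual_vec))  # [0.6, 0.8, 0.0, 1.0, 0.0]
```

The fused vector can then be fed to any standard classifier. Its length is the sum of the two modality dimensions.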

arXiv.org e-Print Archive

Crossref

Institutional Research Information System University of Turin

Using Support Vector Machines for Terrorism Information Extraction

Author: E. Riloff
J.-T. Kim
S. Baluja
S. Soderland
V. N. Vapnik
Publication venue: Springer Science and Business Media LLC
Publication date: 01/06/2003

Crossref

Institutional Knowledge at Singapore Management University

Classifying Organizations for Food System Ontologies using Natural Language Processing

Author: Earl E. Louise
Hollander Allan D.
Huber Patrick R.
Jiang Tianyu
Lange Matthew
Riloff Ellen
Schillo R. Sandra
Stringham Nathan
Ubbiali Giorgio A.
Vinogradova Sonia
Publication date: 19/09/2023

Our research explores the use of natural language processing (NLP) methods to automatically classify entities for the purpose of knowledge graph population and integration with food system ontologies. We have created NLP models that can automatically classify organizations with respect to categories associated with environmental issues as well as Standard Industrial Classification (SIC) codes, which are used by the U.S. government to characterize business activities. As input, the NLP models are given text snippets retrieved by the Google search engine for each organization, which together serve as a textual description of the organization used for learning. Our experimental results show that NLP models can achieve reasonably good performance on these two classification tasks, and they rely on a general framework that could be applied to many other classification problems as well. We believe that NLP models represent a promising approach for automatically harvesting information to populate knowledge graphs and aligning the information with existing ontologies through shared categories and concepts.
Comment: Presented at IFOW 2023 Integrated Food Ontology Workshop at the Formal Ontology in Information Systems Conference (FOIS) 2023 in Sherbrooke, Quebec, Canada July 17-20th, 202

arXiv.org e-Print Archive

Dynamic summarization of bibliographic-based data

Author: A Haase
A Yamanashi
AR Aronson
C Fraser
C Sneiderman
CB Ahlers
DA Lindberg
DB Johnson
DL Sackett
E Riloff
H Kilicoglu
John F Hurdle
M Fiszman
MA Rochester
ML Chambliss
R Khoury
S Cole
S Golder
S Karimi
S Kullback
S Peri
T Bekhuis
T Elizabeth Workman
TC Rindflesch
TE Workman
U Hahn
WR Hersh
Y Lin
Y Niu
Publication venue: BioMed Central
Publication date: 01/02/2011

Background: Traditional information retrieval techniques typically return excessive output when directed at large bibliographic databases. Natural Language Processing applications strive to extract salient content from this excess data. Semantic MEDLINE, a National Library of Medicine (NLM) natural language processing application, highlights relevant information in PubMed data. However, Semantic MEDLINE implements manually coded schemas, accommodating few information needs. Currently there are only five such schemas, while many more would be needed to realistically accommodate all potential users. The aim of this project was to develop and evaluate a statistical algorithm that automatically identifies relevant bibliographic data; the new algorithm could be incorporated into a dynamic schema to accommodate various information needs in Semantic MEDLINE and eliminate the need for multiple schemas.
Methods: We developed a flexible algorithm named Combo that combines three statistical metrics, the Kullback-Leibler Divergence (KLD), Riloff's RlogF metric (RlogF), and a new metric called PredScal, to automatically identify salient data in bibliographic text. We downloaded citations from a PubMed search query addressing the genetic etiology of bladder cancer. The citations were processed with SemRep, an NLM rule-based application that produces semantic predications. SemRep output was then processed by Combo, by the standard Semantic MEDLINE genetics schema, and independently by the individual KLD and RlogF metrics. We evaluated each summarization method using an existing reference standard within the task-based context of genetic database curation.
Results: Combo asserted 74 genetic entities implicated in bladder cancer development, whereas the traditional schema asserted 10 genetic entities; the KLD and RlogF metrics individually asserted 77 and 69 genetic entities, respectively. Combo achieved 61% recall and 81% precision, with an F-score of 0.69. The traditional schema achieved 23% recall and 100% precision, with an F-score of 0.37. The KLD metric achieved 61% recall and 70% precision, with an F-score of 0.65. The RlogF metric achieved 61% recall and 72% precision, with an F-score of 0.66.
Conclusions: Semantic MEDLINE summarization using the new Combo algorithm outperformed a conventional summarization schema in a genetic database curation task. It could potentially streamline information acquisition for other needs without requiring multiple hand-built saliency schemas.
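The sketch below offers a quick check on the reported figures. It recomputes the balanced F-score, the harmonic mean of precision and recall, from the rounded percentages. It also shows RlogF in its usual formulation from Riloff's work, (F / N) * log2(F). In that formula, F counts an item's occurrences in the relevant subset and N its total occurrences. The abstract does not describe how Combo weights RlogF against KLD and PredScal, so only the standalone metric is shown.

```python
import math


def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rlogf(relevant_hits: int, total_hits: int) -> float:
    """RlogF = (F / N) * log2(F): F = occurrences in the relevant subset,
    N = total occurrences. Items with no relevant occurrences score 0."""
    if relevant_hits == 0 or total_hits == 0:
        return 0.0
    return (relevant_hits / total_hits) * math.log2(relevant_hits)


# Precision/recall pairs as reported in the abstract (rounded percentages).
for name, p, r in [("Combo", 0.81, 0.61), ("Schema", 1.00, 0.23),
                   ("KLD", 0.70, 0.61), ("RlogF", 0.72, 0.61)]:
    print(f"{name}: F = {f_score(p, r):.3f}")
```

Recomputed this way, the schema, KLD and RlogF values round to the reported 0.37, 0.65 and 0.66. Combo comes out near 0.70 rather than the reported 0.69. A gap of that size is consistent with the published precision and recall having been rounded.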

Crossref

Directory of Open Access Journals

PubMed Central

Sarcasm detection using machine learning algorithms in Twitter: A systematic review

Author: Ahmed Ibrahim Alzahrani
Bali T.
Bianca Wright
Das R.
Davidov D.
González-Ibáñez R.
Hosam Al-Samarraie
Kovaz D.
Liu B.
Nayak A. S.
Ptáček T.
Rani S. R.
Riloff E.
Samer Muthana Sarsam
Tungthamthiti P.
Zhang M.
Publication venue: SAGE Publications
Publication date: 01/09/2020

Crossref

Coventry University Pure Portal

Collective emotions online and their influence on community life

Author: A Chmiel
A Czaplicka
A Kappas
A Tumasjan
A-L Barabási
AJ Gerber
Anna Chmiel
Arvid Kappas
Attila Szolnoki
B Kujawski
B Pang
BA Huberman
C Castellano
C Darwin
C Macdonald
F Radicchi
F Schweitzer
F Sebastiani
G Paltoglou
Georgios Paltoglou
H Rheingold
J Posner
J Suler
J Walther
J-P Onnela
Janusz A. Hołyst
Julian Sienkiewicz
Kevan Buckley
LA Feldman
M Gamon
M Mitrović
M Skowron
M Szell
M Taboada
Mike Thelwall
NH Frijda
P Krapivsky
P Sobkowicz
PJ Lang
PS Dodds
R Reisenzein
RB Zajonc
Riloff E
RIM Dunbar
S Gobron
SH Hemenover
T Wilson
W James
Publication venue: Public Library of Science (PLoS)
Publication date: 13/07/2011

E-communities, social groups interacting online, have recently become an object of interdisciplinary research. As with face-to-face meetings, Internet exchanges may include not only factual information but also emotional information: how participants feel about the subject discussed or about other group members. Emotions are known to be important in affecting interaction partners in offline communication in many ways. Could emotions in Internet exchanges affect others and systematically influence quantitative and qualitative aspects of the trajectory of e-communities? The development of automatic sentiment analysis has made large-scale emotion detection and analysis possible using text messages collected from the web. It is not clear whether emotions in e-communities primarily derive from individual group members' personalities or result from intra-group interactions, and whether they influence group activities. We show the collective character of affective phenomena on a large scale as observed in 4 million posts downloaded from Blogs, Digg and BBC forums. To test whether the emotions of a community member may influence the emotions of others, posts were grouped into clusters of messages with similar emotional valences. The frequency of long clusters was much higher than it would be if emotions occurred at random. Distributions for cluster lengths can be explained by preferential processes because conditional probabilities for consecutive messages grow as a power law with cluster length. For BBC forum threads, average discussion lengths were higher for larger values of absolute average emotional valence in the first ten comments, and the average amount of emotion in messages fell during discussions. Our results prove that collective emotional states can be created and modulated via Internet communication and that emotional expressiveness is the fuel that sustains some e-communities.
Comment: 23 pages including Supporting Information, accepted to PLoS ON
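Grouping consecutive messages with the same emotional valence and measuring cluster lengths amounts to run-length counting over a thread. The sketch below assumes each post has already been labeled with a discrete valence of -1, 0 or +1 by some sentiment classifier. Those labels and the toy thread are hypothetical, and the paper's own valence detection is not reproduced here.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, List


def cluster_lengths(valences: List[int]) -> List[int]:
    """Lengths of maximal runs of consecutive messages sharing one valence label."""
    return [len(list(run)) for _, run in groupby(valences)]


def length_distribution(valences: List[int]) -> Dict[int, int]:
    """Number of clusters of each length, in order of first appearance."""
    return dict(Counter(cluster_lengths(valences)))


# Hypothetical thread: valence label of each post, in posting order.
thread = [1, 1, 1, -1, -1, 0, 1, 1]
print(cluster_lengths(thread))      # [3, 2, 1, 2]
print(length_distribution(thread))  # {3: 1, 2: 2, 1: 1}
```

The random baseline the abstract mentions could be obtained by computing the same distribution on shuffled copies of each thread and comparing how often long clusters occur.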

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

Machine Learning in Automated Text Categorization

Author: ANDROUTSOPOULOS I.
ATTARDI G.
BAKER L.D.
BIEBRICHER P.
CAROPRESO M.F.
CAVNAR W.B.
CHAKRABARTI S.
CLACK C.
CLEVERDON C.
COHEN W. W.
COHEN W.W.
DAGAN I.
DEERWESTER S.
DENOYER L.
DIAZ ESTEBAN A.
DRUCKER H.
DUMAIS S.T.
ESCUDERO G.
Fabrizio Sebastiani
FIELD B.
FORSYTH R. S.
FUHR N.
FURNKRANZ J.
GALAVOTTI L.
GALE W. A.
GOVERT N.
GRAY W.A.
GUTHRIE L.
HAYES P.J.
HEAPS H.
HERSH W.
HULL D. A.
ITTNER D.J.
IWAYAMA M.
IYER R.D.
JOACHIMS T.
JOHN G. H.
JUNKER M.
KESSLER B.
KIM Y.-H.
KLINKENBERG R.
KNORZ G.
KOLLER D.
LAM S.L.
LAM W.
LANG K.
LARKEY L. S.
LARKEY L.S.
LEWIS D. D.
LEWIS D.D.
LI H.
LI Y.H.
LIERE R.
LIM J. H.
MASAND B.
MCCALLUM A. K.
MCCALLUM A.K.
MLADENIC D.
MOULINIER I.
MYERS K.
NG H.T.
OH H.-J.
PAZIENZA M. T.
RILOFF E.
ROBERTSON S.E.
ROTH D.
RUIZ M.E.
SABLE C.L.
SARACEVIC T.
SCHAPIRE R. E.
SCHUTZE H.
SCOTT S.
SEBASTIANI F.
SINGHAL A.
SLONIM N.
TAIRA H.
TUMER K.
TZERAS K.
VAN RIJSBERGEN C. J.
WIENER E.D.
YANG Y.
YU K.L.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2001

The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning the characteristics of the categories from a set of preclassified documents. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication in ACM Computing Survey
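Here is one way to make the inductive process concrete: learning the characteristics of each category from preclassified documents. The example is a minimal multinomial Naive Bayes categorizer, one of the classic learners in this paradigm. It is an illustrative sketch with whitespace tokenization and add-one smoothing. It is not code from the survey. The training documents and category names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class NaiveBayesCategorizer:
    """Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, docs: List[Tuple[str, str]]) -> "NaiveBayesCategorizer":
        self.class_counts: Counter = Counter()        # documents per category
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for text, label in docs:
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior of the category
            denom = sum(self.word_counts[label].values()) + vocab_size
            for tok in text.lower().split():
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical preclassified training documents.
training = [
    ("wheat harvest exports rise", "grain"),
    ("corn and wheat prices", "grain"),
    ("crude oil output cut", "crude"),
    ("oil prices climb on supply", "crude"),
]
clf = NaiveBayesCategorizer().fit(training)
print(clf.predict("wheat exports"))  # grain
```

In the survey's terms, whitespace tokenization stands in for document representation, `fit` for classifier construction, and held-out predictions would feed classifier evaluation.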

arXiv.org e-Print Archive

CiteSeerX

Crossref

Discovering gene annotations in biomedical text databases

Author: A Cakmak
Ali Cakmak
Burr Settles
Chin-Yew Lin
Deepak Ravichandran
DV Kalashnikov
E Camon
Ellen Riloff
Eugene Agichtein
G Salton
Gideon S Mann
Gultekin Ozsoyoglu
Jiawei Han
JoonHo Lee
K Asakawa
K Asako
KarenSparck Jones
L Lovasz
Michael Fleischman
Oren Etzioni
Philip Resnik
PW Lord
Roy Rada
S Raychaudhuri
S White
Sergey Brin
The Gene Ontology Consortium
Tomonori Izumitani
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Genes and gene products are frequently annotated with Gene Ontology concepts based on the evidence provided in genomics articles. Manually locating and curating information about a genomic entity from the biomedical literature requires vast amounts of human effort. Hence, there is clearly a need for automated computational tools that annotate genes and gene products with Gene Ontology concepts by computationally capturing the related knowledge embedded in textual data.
Results: In this article, we present an automated genomic entity annotation system, GEANN, which extracts information about the characteristics of genes and gene products from PubMed article abstracts and translates the discovered knowledge into Gene Ontology (GO) concepts, a widely used standardized vocabulary of genomic traits. GEANN uses textual "extraction patterns" and a semantic matching framework to locate phrases matching a pattern and produce Gene Ontology annotations for genes and gene products. In our experiments, GEANN reached a precision of 78% at a recall of 61%. On a select set of Gene Ontology concepts, GEANN either outperforms or is comparable to two other automated annotation studies. Using WordNet for semantic pattern matching improves precision and recall by 24% and 15%, respectively, and the improvement due to semantic pattern matching becomes more apparent as the Gene Ontology terms become more general.
Conclusion: GEANN is useful for two distinct purposes: (i) automating the annotation of genomic entities with Gene Ontology concepts, and (ii) providing existing annotations with additional "evidence articles" from the literature. Textual extraction patterns constructed from the existing annotations achieve high precision. The semantic pattern matching framework is more flexible than "exact matching", with the advantage of locating approximate pattern occurrences with similar semantics. The relatively low recall of our pattern-based approach may be improved either by employing a probabilistic annotation framework based on the annotation neighbourhoods in textual data, or, alternatively, by lowering the statistical enrichment threshold for applications that place more value on higher recall.
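A small regex-based sketch can illustrate the difference between exact and semantic pattern matching. It is not GEANN's implementation. The `<GENE>` slot syntax, the pattern, the example sentence and the `SYNONYMS` table are all hypothetical. The table is a stand-in for the WordNet lookups the abstract mentions.

```python
import re
from typing import Dict, List, Set

# Hypothetical synonym table standing in for WordNet lookups.
SYNONYMS: Dict[str, Set[str]] = {
    "regulates": {"regulates", "controls", "modulates"},
    "expression": {"expression", "transcription"},
}


def pattern_to_regex(pattern: List[str], semantic: bool) -> re.Pattern:
    """Compile a token pattern; "<GENE>" is a slot that captures an entity name.
    With semantic=True, each token is widened to its synonym set."""
    parts = []
    for tok in pattern:
        if tok == "<GENE>":
            parts.append(r"(?P<gene>[A-Za-z0-9-]+)")
        elif semantic and tok in SYNONYMS:
            alternatives = sorted(SYNONYMS[tok])
            parts.append("(?:" + "|".join(map(re.escape, alternatives)) + ")")
        else:
            parts.append(re.escape(tok))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def find_genes(text: str, pattern: List[str], semantic: bool) -> List[str]:
    """Return the entity captured by every occurrence of the pattern in text."""
    return [m.group("gene") for m in pattern_to_regex(pattern, semantic).finditer(text)]


pattern = ["<GENE>", "regulates", "expression"]
text = "BRCA1 regulates expression in cells, while TP53 modulates transcription of targets."
print(find_genes(text, pattern, semantic=False))  # ['BRCA1']
print(find_genes(text, pattern, semantic=True))   # ['BRCA1', 'TP53']
```

The semantic variant finds an extra, approximate occurrence, which mirrors the recall gain the abstract attributes to semantic matching. It also carries a risk of false matches when synonyms are too loose.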

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Semantic annotation of morphological descriptions: an overall strategy

Author: A Taylor
D Kirkup
E Riloff
G Curry
G Diggs
G Sautter
H Cui
Hong Cui
MM Wood
R Abascal
S Lydon
S Soderland
X Tang
Publication venue: Springer Science and Business Media LLC

Crossref

Benchmarking Ontologies: Bigger or Better?

Author: A Faatz
A Gangemi
A Gomez-Perez
A Gómez-Pérez
A Mädche
A Rzhetsky
A Spooner
Andrey Rzhetsky
Anna Divoli
AR Aronson
AT McCray
B Smith
BA Kipfer
C Brewster
C Laird
C Rosse
CE Lipscomb
CJ Bult
CL Smith
D Lin
D Maynard
DL Cook
E Riloff
FB Rogers
G Jurasinski
G Miller
I Scholastic
I Sim
Ilya Mayzus
J Brank
J Devlin
J Evermann
J Yu
JA Blake
James A. Evans
JC Park
JI Rodale
JR Firth
JS Justeson
K Dellschaft
K Toutanova
K Verspoor
K. Bretonnel Cohen
KB Cohen
Lixia Yao
LM Spencer
M Ashburner
M Grüninger
M Minsky
M Missikoff
M Sabou
N Guarino
O Bodenreider
P Buitelaar
P Cimiano
PD Karp
R Cornet
R Navigli
S Hyun
S Kiritchenko
S Schulz
S York
S Zhang
SH Brown
TR Gruber
U Hahn
V Walden
W Ceusters
Y Sure
Z Harris
Publication venue: Public Library of Science
Publication date: 01/01/2011

A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly, as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central